Mention Model for Learning Rules from Incomplete Examples

Author

Mohammad S. Sorower

Abstract

Introduction. We are motivated by the problem of learning rules from naturally available data sources such as natural language texts, web pages, and medical databases. At first glance, learning rules from natural sources like the web seems to consist of extracting specific facts and then mining rules from them. Unfortunately, there are two major obstacles to fully realizing the dream of unlimited learning of general rules from natural sources. First, natural data sources such as texts and medical histories are radically incomplete in that only a tiny fraction of all true facts are ever mentioned. Second, and perhaps more discouragingly, natural sources are systematically biased in what is mentioned. For example, news stories are biased towards newsworthiness, which correlates with rarity or novelty, sometimes referred to as “the man bites dog phenomenon.” In previous work, we introduced the notion of a mention model, which models the observation process of an agent generating the data. We showed the effectiveness of an implicit mention model in learning rules by adapting the scoring function used to score the hypothesized rules [Doppa et al., 2010]. While implicit mention models are very simple, their usefulness is debatable, especially when the mention model is quite complicated. In this work, we propose a generative approach that explicitly models the mention process of the data. We propose an iterative EM-style algorithm to learn the parameters of our model. We demonstrate the usefulness of the proposed explicit mention model on both synthetic and real-world datasets.

Explicit Mention Model. Facts extracted from natural language texts can typically be represented using a set of interrelated predicates. Therefore, it is possible to propose a set of inaccurate rules to predict each predicate from the remaining ones. Our goal then is to construct a probabilistic mention model that captures which facts are mentioned and extracted by the extractor from the text, given the true facts about the world. Given some extracted facts, the learning agent inverts the mention model to infer a distribution over sets of true facts. An inductive program can then be used to infer general rules from distributions over true facts. Figure 1 shows the idea of the explicit mention model and how the mention observations interact with the true facts.

Learning Explicit Mention Model. An iterative approach to learning the explicit mention model in an Expectation Maximization (EM) setting is shown in Algorithm 1. We use a relational data mining algorithm called FARMER [Nijssen and Kok, ...

Figure 1: Explicit Mention Model. Interaction between facts and the mention observations.
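To make the idea of inverting a mention model concrete, here is a minimal Python sketch. It is not the algorithm from the paper (which learns rules with FARMER and is only referenced, not shown, in this abstract); it is a deliberately tiny toy under assumptions introduced here for illustration: a single binary fact type, a true fact is mentioned with a known probability `p_mention`, a false fact is never mentioned, and only the prior probability that a fact is true is learned. The function names `posterior_true` and `em_prior` are hypothetical.

```python
def posterior_true(mentioned: bool, prior: float, p_mention: float) -> float:
    """P(fact is true | mention observation) in the toy model.

    Assumptions (not from the paper): 0 < prior < 1, a true fact is
    mentioned with probability p_mention, a false fact is never mentioned.
    """
    if mentioned:
        # Under the toy assumption, only true facts are ever mentioned.
        return 1.0
    # Bayes' rule: a fact can be unmentioned because it is true but was
    # omitted, or because it is false.
    true_but_omitted = prior * (1.0 - p_mention)
    false_fact = 1.0 - prior
    return true_but_omitted / (true_but_omitted + false_fact)


def em_prior(observations, p_mention, prior=0.5, iters=200):
    """EM-style estimate of the prior that a fact is true.

    observations: list of booleans, True if the fact was mentioned.
    p_mention is treated as known; only the prior is re-estimated.
    """
    for _ in range(iters):
        # E-step: expected truth value of each fact given current prior.
        expected = [posterior_true(o, prior, p_mention) for o in observations]
        # M-step: the new prior is the average expected truth value.
        prior = sum(expected) / len(expected)
    return prior


# 30 of 100 facts are mentioned; true facts are mentioned half the time.
observations = [True] * 30 + [False] * 70
estimate = em_prior(observations, p_mention=0.5)
print(round(estimate, 4))
```

In this toy setting the estimate settles near the mention rate divided by `p_mention`, which illustrates why counting mentions alone underestimates how often facts are true. A fuller model would also have to learn the mention probabilities and the rules themselves, which this sketch does not attempt.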

Similar Resources

Learning Rules from Incomplete Examples via a Probabilistic Mention Model

We consider the problem of learning rules from natural language text sources. These sources, such as news articles, journal articles, and web texts, are created by a writer to communicate information to a reader, where the writer and reader share substantial domain knowledge. Consequently, the texts tend to be concise and mention the minimum information necessary for the reader to draw the corr...

Learning Rules from Incomplete Examples via Implicit Mention Models

We study the problem of learning general rules from concrete facts extracted from natural data sources such as newspaper stories and medical histories. Natural data sources present two challenges to automated learning, namely, radical incompleteness and systematic bias. In this paper, we propose an approach that combines simultaneous learning of multiple predictive rules with differential s...

Learning Rules from Incomplete Examples: A Pragmatic Approach

In this paper, we consider the problem of inductively learning rules from specific facts extracted from texts. This problem is challenging for two reasons. First, natural texts are radically incomplete since there are always too many facts to mention. Second, natural texts are systematically biased towards novelty and surprise, which presents an unrepresentative sample to the learner. Our so...

Protection of the Disabled in International Law

Objective: People with disabilities need special legal attention. In this regard, special rules have gradually been developed in domestic and international law. The Convention on the Rights of Persons with Disabilities (2006) at the international level and the Iranian Act on Comprehensive Protection of Disabled People (1383) at the national level are examples of the above-mentioned legal develop...
